Defining the Load Factor ($\lambda$)
The Load Factor ($\lambda$) is the central metric determining the real-world efficiency of a hash table. It quantifies how full the table is, directly correlating to the probability and length of collisions.
$$ \lambda = \frac{\text{Number of Items Stored } (N)}{\text{Size of Table/Buckets } (M)} $$- Separate Chaining: $\lambda$ represents the average length of the linked lists (chains).
- Open Addressing (Probing): $\lambda$ represents the table saturation, directly correlating to the expected number of probes required to find an empty slot or a specific item.
Load Factor & Complexity
To achieve the theoretical $O(1)$ average case performance for Search, Insert, and Delete operations, the Load Factor ($\lambda$) must be constant and generally small (e.g., $< 1.0$ or $< 0.75$).
| Performance Factor | Average Case (Target $\lambda$) | Worst Case (High $\lambda$) |
|---|---|---|
| Search/Insert Time | $O(1 + \lambda) \approx O(1)$ | $O(N)$ (Approaches linear search) |
| Collision Rate | Low/Manageable | High |
| Space Overhead | $O(N + M)$ | $O(N + M)$ |
Key Implication: If $M$ is fixed and $N$ grows, $\lambda$ increases, degrading performance toward $O(N)$.
Strategy: Managing Saturation (Re-hashing)
To prevent performance degradation caused by high $\lambda$, efficient hash tables implement Re-hashing (resizing).
- Threshold Monitoring: The system monitors $\lambda$. If it exceeds a predefined threshold (e.g., 0.75 for Open Addressing, or 2.0 for Chaining), resizing is triggered.
- Expansion: The table size $M$ is increased (e.g., doubled to $M'$).
- Redistribution: All existing $N$ items are re-inserted into the new, larger table $M'$, using a potentially new hash function (mod $M'$).
Result: $\lambda$ immediately drops, restoring the structure's $O(1)$ average performance guarantee. Note that the re-hashing operation itself temporarily costs $O(N)$.
1. What is the primary purpose of re-hashing a hash table?
2. In a hash table using Separate Chaining, if $N=20$ items are stored in $M=10$ buckets, what does the load factor $\lambda=2$ represent?
3. A system uses an Open Addressing hash table with $M=10,000$ slots. It stores $N=1,500$ items. What is the expected average time complexity for a search?
4. Although a single re-hash operation costs $O(N)$, why is the amortized cost of insertion still considered $O(1)$?